On Automatic Plagiarism Detection Based on n-Grams Comparison
Authors
Abstract
When automatic plagiarism detection is carried out against a reference corpus, a suspicious text is compared to a set of original documents in order to relate the plagiarised text fragments to their potential source. One of the biggest difficulties in this task is locating plagiarised fragments that have been modified with respect to the source text (by rewording, insertion or deletion, for example). Defining proper text chunks as the comparison units of the suspicious and original texts is crucial for the success of this kind of application. Our experiments with the METER corpus show that the best results are obtained with low-level word n-gram comparisons (n = {2, 3}).
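The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function (the fraction of the suspicious text's word n-grams that also occur in a candidate source), the tokenisation, and the names `word_ngrams`, `containment` and `best_source` are all assumptions chosen for illustration.

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) of a lowercased text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=2):
    """Fraction of the suspicious text's n-grams that also occur in source.

    Assumed scoring function for this sketch; the abstract does not
    specify which measure the paper uses.
    """
    susp = word_ngrams(suspicious, n)
    if not susp:
        return 0.0
    return len(susp & word_ngrams(source, n)) / len(susp)


def best_source(suspicious, corpus, n=2):
    """Return (name, score) of the reference document with the highest score.

    `corpus` is a hypothetical dict mapping document names to their text.
    Returns None when the corpus is empty.
    """
    if not corpus:
        return None
    scores = {name: containment(suspicious, text, n) for name, text in corpus.items()}
    return max(scores.items(), key=lambda item: item[1])
```

With low n (2 or 3), a reworded fragment still shares many n-grams with its source, while larger n makes a single inserted or deleted word break every n-gram that spans it.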
Similar resources
Monolingual and Crosslingual Plagiarism Detection
Automatic plagiarism detection considering a reference corpus compares a suspicious text to a set of documents in order to relate the plagiarised fragments to their potential source. The suspicious and source documents can be written either in the same language (monolingual) or in different languages (crosslingual). In the context of the Ph.D., our work has been focused on both monolingual and...
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
With due respect to authors' rights, plagiarism detection is one of the critical problems in the field of text mining, and one that interests many researchers. The issue is considered a serious one in higher academic institutions. There exist language-independent tools, but they do not yield reliable results because they ignore the special features of each language. Considering the paucit...
Word Length n-Grams for Text Re-use Detection
The automatic detection of shared content in written documents (which includes text reuse and its unacknowledged commitment, plagiarism) has become an important problem in Information Retrieval. This task requires exhaustive comparison of texts in order to determine how similar they are. However, such comparison is impossible in those cases where the number of documents is too large. Therefore, ...
N-gram Overlap in Automatic Detection of Document Derivation
Establishing the authenticity and independence of documents in relation to others is not a new problem, but in the era of hyperproduction of e-text it has certainly gained even more importance. There is an increased need for automatic methods for determining the originality of documents in a digital environment. The method of n-gram overlap is only one of several methods proposed in the literature and is ...
Automatic Plagiarism Detection Using Word-Sentence Based S-gram
Plagiarism is an academic problem that is caught more and more each year. A common trick that cheaters use is inserting or removing a few terms, sentences, or paragraphs in the original copy to make the reader believe that the plagiarised copy and the original copy are different. This paper provides a new way to detect plagiarism by checking the similarity between sentences, and par...
Journal title:
Volume, issue
Pages -
Publication date: 2009